Programs based on hash tables and Burrows-Wheeler are very fast for mappingshort reads to genomes but have low accuracy in the presence of mismatches andgaps. Such reads can be aligned accurately with the Smith-Waterman algorithmbut it can take hours and days to map millions of reads even for bacteriagenomes. We introduce a GPU program called MaxSSmap with the aim of achievingcomparable accuracy to Smith-Waterman but with faster runtimes. Similar to mostprograms MaxSSmap identifies a local region of the genome followed by exactalignment. Instead of using hash tables or Burrows-Wheeler in the first part,MaxSSmap calculates maximum scoring subsequence score between the read anddisjoint fragments of the genome in parallel on a GPU and selects the highestscoring fragment for exact alignment. We evaluate MaxSSmap's accuracy andruntime when mapping simulated Illumina E.coli and human chromosome one readsof different lengths and 10\% to 30\% mismatches with gaps to the E.coli genomeand human chromosome one. We also demonstrate applications on real data bymapping ancient horse DNA reads to modern genomes and unmapped paired readsfrom NA12878 in 1000 genomes. We show that MaxSSmap attains comparable highaccuracy and low error to fast Smith-Waterman programs yet has much lowerruntimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMapwith high accuracy and low error much faster than if Smith-Waterman were used.On short read lengths of 36 and 51 both MaxSSmap and Smith-Waterman have loweraccuracy compared to at higher lengths. On real data MaxSSmap produces manyalignments with high score and mapping quality that are not given by NextGenMapand BWA. The MaxSSmap source code is freely available fromhttp://www.cs.njit.edu/usman/MaxSSmap.
展开▼